American anthropology is in a crisis. Although more anthropologists
are being trained than ever before (about 7000 currently enrolled graduate
students), traditional employment markets are shrinking. In response, the
field has turned towards new possibilities for applied research, but is
discovering that anthropological theory often has little problem-solving
relevance. This need not be so. To cumulatively advance, our insights 
must be tested in a pragmatic arena, where they can be falsified, and if 
necessary, discarded. The evaluation of action programs provides one such 
setting. Not only are the techniques of anthropology essential in proper 
program evaluations; experience in evaluation research will add significantly 
to the growth of anthropological thought. 

THE NATURE OF EVALUATION 

During the 1960's, American society developed a deep-rooted faith in
"action programs" as a solution to social problems (Williams & Evans 1969).
A wide range of programs began, aimed at redistributing power and funds
to uplift the disadvantaged through education, economics, public health, 
community development, improved ethnic relations, and so on. As the levels 
of commitment grew, hundreds of millions of dollars were spent on HEAD START, 
MODEL CITIES, JOB CORPS, and the like. By the end of the decade, however, 
the federal government, which funded most action programs, demanded better 
information about the return it was getting on its investments, and interest 
in evaluation research grew. 

At its basic level, evaluation research attempts to determine the degree
to which a program has its intended effects. Does compensatory education
improve the cognitive abilities of children? Does mental rehabilitation
improve psychological adjustment? Do new treatments increase the rate of drug
addiction cures? Yet, as we shall see, this summative evaluation is not so
easy as it first appears.

To be more useful, both to administrators and society, evaluation 
should tell us not only what occurred, but how and why these results were 
obtained. Such process evaluation assesses program design and implementation. 
By elucidating mechanisms through which effects are achieved, we can directly 
verify the theoretical justification for a treatment, and identify possible 
confounding variables within the implementation process. Although process 
evaluation faces even greater methodological problems than the analysis of 
results, it is vital for an understanding of summative data.

Formative evaluation not only assesses program implementation, but applies 
the analysis of process and results as recommendations for improvements in
program structure. While formative evaluations usually incorporate an analysis 
of process, they are even more of an art, without a rigorous inferential 
grounding. Such evaluations are usually concerned with the establishment of 
a new program, but could be incorporated in program design as a continuing
aspect. Indeed, formative evaluation is often the responsibility of program
management, and is closely identified with internal administrative goals.
Formative consultants are "problem solvers", who may not strive to maintain
the kind of academic disinterest that is more characteristic of summative
and process evaluation. 

The distinctions between forms of program evaluation are not hard and
fast, nor should they be. Studies of design, process and results all add
to our understanding of what action programs accomplish and why. However,
summative evaluation, despite limited rigorous application, is the dominant
methodological perspective. The present paper argues that the analysis of
results alone is insufficient; qualitative process evaluation poses analytical
problems of no greater magnitude and is essential for meaningful interpretation.

QUANTITATIVE EVALUATIONS OF PROGRAM RESULTS 

At first glance, summative program evaluation seems very simple. A
measurably disadvantaged group is provided with some ameliorative treatment,
and if its performance has improved upon retesting, then the treatment is
considered successful. Yet, how can we tell whether observed changes are
due to the program rather than extraneous factors? An improvement in retest
scores might be due to the maturation of subjects, to the lessons learned in
the pre-test experience, or even to a statistical artifact of the test
procedure itself.

Such problems are sometimes very difficult to see intuitively, but are
crucial for our assessment of program results. A somewhat simplified example
is provided by the federally funded compensatory education programs that are
being implemented all over the country. Admission to remedial "treatment" is
limited to those students who fall below a minimum cut-off score on a cognitive
skills pretest. After a year of program participation, students are tested
again, and nearly always a substantial improvement in scores has occurred.
These students are released, and a new crop who fell below a pre-test cut-off
are admitted.

This kind of "evaluation" has been used to validate compensatory education
programs and the search for increased funds all over the country. Yet,
even if we were to accept the testing procedures as adequate, and assume that
maturation, pre-test experience and Hawthorne effects are irrelevant,
conclusions are still suspect. There are statistical artifacts which prevent
such an "evaluation" from demonstrating the success of remedial treatment in
improving cognitive abilities.

The method of program admission (on the basis of lowest pre-test scores)
can greatly affect program results. Such admission procedures assume that
the results of the single pre-test provide a valid measure of the cognitive
ability of students, but in reality a student's score on a given test can
vary widely due to a range of chance factors. This variation is the normal
distribution of pre-test scores around a mean. By taking only those students
with the lowest scores into a program, we are emphasizing the downward chance
variation in test scores. If variation was due to chance, on immediate re-test,
the group would not repeat original scores, but duplicate the original
population distribution (see diagram 1). In other words, regression to the
population mean would lead to a substantial improvement in post-test scores
even if compensatory education had no impact at all.
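
The regression artifact described above can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a reconstruction of any actual program data: the function name `simulate_regression`, the ability scale (mean 100, standard deviation 15), the size of the chance error (standard deviation 10) and the cut-off of 85 are all hypothetical values chosen for illustration. No treatment effect is simulated, so any gain in the admitted group's mean is due to regression alone.

```python
import random
import statistics


def simulate_regression(n=10000, cutoff=85.0, seed=0):
    """Return (pre_mean, post_mean) for students admitted below the cutoff.

    Illustrative assumptions: each student has a stable ability, and each
    test score is that ability plus independent chance error. The program
    itself is assumed to have NO effect on the post-test.
    """
    rng = random.Random(seed)
    pre_scores, post_scores = [], []
    for _ in range(n):
        ability = rng.gauss(100.0, 15.0)        # assumed stable ability
        pre = ability + rng.gauss(0.0, 10.0)    # pre-test: ability + chance
        post = ability + rng.gauss(0.0, 10.0)   # re-test: new chance error
        if pre < cutoff:                        # admit only the lowest scorers
            pre_scores.append(pre)
            post_scores.append(post)
    return statistics.mean(pre_scores), statistics.mean(post_scores)


if __name__ == "__main__":
    pre_mean, post_mean = simulate_regression()
    print(f"admitted group, pre-test mean:  {pre_mean:.1f}")
    print(f"admitted group, post-test mean: {post_mean:.1f}")
```

Under these assumptions the admitted group's post-test mean is higher than its pre-test mean even though nothing was done to the students, which is the pattern that the cut-off admission procedure would mistake for program success.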

Comparative evaluations have been designed to deal with this problem,
but there are important obstacles to successful inference. The well-known
Westinghouse/Ohio University evaluation of HEAD START provides a good example,
though one in which statistical artifacts lower rather than raise our estimation
of program results. The researchers were asked to design an ex post facto
study several years after the program had begun. They proceeded by
matching HEAD START participants with outsiders on the basis of cognitive
pre-tests on a series of performance scales, socio-economic backgrounds and
subsequent educational experience. They then compared scores on a cognitive
ability post-test to see if HEAD START participants showed relative improvement.
The evaluation seemed to clearly indicate that HEAD START students did no
better than their "untreated" partners, and the effects of a program costing
hundreds of millions of dollars were questioned (Cicirelli, et al. 1969).

These results, however, are suspect, because of the difficulties in
matching pairs (Campbell & Erlebacher 1970). HEAD START programs are
designed for the most disadvantaged, and nearly all eligible participants
are enrolled. Thus, HEAD START participants can be expected to score lower
on a cognitive abilities test than the population at large. The matched pairs,
however, would tend to be those members of the general population who had
happened to score poorly on the particular pre-test used. The mean score
of the population from which these individuals came would be significantly
higher than the mean for disadvantaged students. Furthermore, in a retest,
each group would regress towards its own population mean. Since the HEAD START
mean is lower, HEAD START students would tend to show less relative improvement,
and the program would seem a failure (see diagram 2). Still unanswered,
however, is whether even this improvement would have been shown in the
absence of the HEAD START program.
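
The matching artifact can be sketched in the same way. This is a hedged illustration, not the Westinghouse/Ohio University design: the function name `simulate_matching`, the two population means (90 for the program group, 100 for the general population), the spread of ability, the size of the chance error, and the crude "matching" rule (keeping anyone whose pre-test falls in the same band of 80 to 90) are all assumptions made for the example. Again, the program is given no effect at all.

```python
import random
import statistics


def simulate_matching(n=20000, low=80.0, high=90.0, seed=1):
    """Return (program_post_mean, comparison_post_mean) for matched groups.

    Illustrative assumptions: the program population has a lower mean
    ability (90) than the general population (100); individuals are
    "matched" simply by having a pre-test score inside the same band;
    the program has NO effect on post-test scores.
    """
    rng = random.Random(seed)

    def matched_post_mean(population_mean):
        posts = []
        for _ in range(n):
            ability = rng.gauss(population_mean, 15.0)  # assumed ability
            pre = ability + rng.gauss(0.0, 10.0)        # pre-test with error
            post = ability + rng.gauss(0.0, 10.0)       # post-test with error
            if low <= pre <= high:                      # crude matching band
                posts.append(post)
        return statistics.mean(posts)

    program = matched_post_mean(90.0)       # disadvantaged population
    comparison = matched_post_mean(100.0)   # general population
    return program, comparison


if __name__ == "__main__":
    program, comparison = simulate_matching()
    print(f"program group post-test mean:    {program:.1f}")
    print(f"comparison group post-test mean: {comparison:.1f}")
```

Although both groups start with similar pre-test scores, each regresses towards its own population mean, so under these assumptions the comparison group scores higher on the post-test and a program with zero effect appears harmful.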

Quantitative methodologists are well aware of the obstacles to proper
inference: the possibility of concluding that a treatment has an effect
when it really doesn't, and vice versa. Although the warnings are sometimes
ignored, a variety of threats to the internal validity of an evaluation have
been noted, such as differential maturation of comparison groups, variation
in measuring instruments, differential mortality in treated and untreated 
groups, and so on (Cook & Campbell 1975). Depending on the fineness of 
distinctions, the list could be expanded to at least 15 or 20 items. 

Sophisticated evaluation designs have increased our ability to distinguish
real from apparent program effects (Campbell & Stanley 1966). Although true
experiments in which subjects are randomly assigned to test and control
groups are the best answer, quasi-experimental designs can provide
meaningful results provided researchers are aware of the limitations
(Campbell 1974). Despite continuing examples of inadequate methodology,
the theoretical basis for methodologically sound summative evaluations exists.
Yet, even so, there are still limits to the utility of quantitative assessments
alone.

A quantitative evaluation of results is unable to answer many questions
no matter how good its experimental or quasi-experimental design. It cannot
tell us how a program is implemented, whether the results are transferable
to other situations, or why the observed results occurred. Not all of the
threats to evaluation validity can be met by considering quantitative results.
Such analysis cannot tell us whether formal program goals actually reflect
informal ends sought. It cannot tell us if there are differences in the way
a program is implemented for different individuals or for different locations.
It cannot show whether the program has changed over time. It cannot elucidate
subtle effects of implementation in participant selection, differential
mortality or differential learning. It cannot validate the adequacy of
testing methods or ascertain diffuse program effects. We must have a broader
understanding of what a program does before we can begin to explain why it
succeeds or fails.

Summative evaluations treat action programs as if they were black boxes.
They demonstrate what results have occurred, but do not elucidate equally
important how and why questions. To understand these, we must open the black
box to look at the process of program implementation and supplement the analysis
of results. Such research can define the qualitative dimensions about which
quantitative data can be gathered and provides a further grounding for theoretical
inference. Moreover, the distinction between quantitative and qualitative
understanding is not so great as we make it seem (see Campbell 1975).



THE EVALUATION OF PROCESS 

While the methodological problems of quantitative evaluation are fairly
great, obstacles to rigorous qualitative evaluation seem almost of another
order of magnitude. Canons of rigor have yet to be established and qualitative
assessments of process remain as much an art as a science, albeit an art
at which anthropologists are thought to excel. All we can do at present is
indicate existing problems and suggest appropriate directions for pragmatic
research. To understand process, we must understand how particular things
are processed. In the context of action programs, we must learn how and why
things occur as an individual progresses through "treatment". The ethnographic
model of anthropology, with its qualitative assessment of a particular case,
provides a basis from which to start.

Ethnography achieves understandings by combining the knowledge of insiders
and outsiders. The ethnographer has an external perspective which lets him
see the importance in what an insider, from too great familiarity, dismisses
as trivial. Still, to interpret his observations, the ethnographer needs
a broader context of comparison. In traditional fieldwork, this is provided
by anthropological theory and a familiarity with similar regional cultures
and similar field experiences. Even so, a consideration of the differences
and similarities among a range of local sites is often useful. In program
evaluation, the ethnographer should supplement his disciplinary background
with general knowledge about formal organizations, and about similar programs
or settings. Lacking a contemporary program for comparison, the ethnographer can
at least consider sufficient time depth to permit an adequate comparative
appraisal.

The mechanics of ethnographic technique are observation and participation,
of which there are two aspects: on the one hand, the ethnographer as an outside
specialist tries to place observed events within his own categories of
relationship, but at the same time he tries to understand their import
for the participants themselves. The most interesting part of ethnography
involves putting these two views (the external subjectivity of the
observer and the internal subjectivity of the native) together. Both
perspectives must be triangulated with developed theory to provide a
fairly valid basis for ethnographic inference.

Still, there is no guarantee that the interpretation of any
single ethnographer would be replicated through restudy by another.
Unlike summative evaluation, which measures a few well-operationalized
variables, the study of process is concerned with patterns among a
much larger range of factors. The methodological problems of process
analysis cannot, at present, be rigorously solved (see Campbell 1974).

Yet quantification, as such, means very little. It is often the
qualitative dimensions distinguished by ethnography that provide the
appropriate basis for quantitative scales. Quantitative assessments,
moreover, can tell us little about how and why particular relationships
exist. The import of any quantitative analysis of results rests on an
independent appraisal of the causal relations involved. Ethnography
tries to comprehend these relationships in their entirety, and though
the problem of rigor is severe, the problem of artificiality is nearly
eliminated. The final purpose of evaluation is pragmatic: the
improvement of program results. An ethnography of process provides
reasonable insights about how such improvements could be made in a
way that summative evaluation alone cannot.

PROCESS EVALUATION IN PRACTICE

The best way to illustrate the nature of process evaluation and
its problems is through an example such as my ongoing assessment of
the Experimental Technology Incentives Program (ETIP) of the National
Bureau of Standards (NBS). ETIP is not a typical candidate for social
science evaluation. Its formal mandate, which derives from the President's
1972 Science and Technology message, is to "facilitate technological
change." ETIP seeks to achieve this goal indirectly, by developing
experimental policy changes in co-operating governmental organizations
which indirectly "create an environment conducive to innovation." Despite
the lack of traditional "clients" or "treatments," an assessment of ETIP
poses the same questions as any study of process.

Last spring, I was asked by the National Academy of Sciences (NAS) to
conduct an 18 month evaluation of the ETIP program. The questions at
issue were not ETIP's final effects on technological change, which could
be considered later by technologists, but rather an assessment of ETIP
as a whole, as an experiment in organizational form within the federal
bureaucracy.

Several basic questions were raised by NAS: Can an organization like
ETIP actually convince federal agencies to experiment with operating
policies? To what extent has ETIP been responsible for any policy changes
that have occurred? Do ETIP's policy experiments reflect mandated program
goals, or the internal needs of co-operating agencies? What factors affect
ETIP's success in developing and implementing experiments? Do ETIP's
experiments have any civilian sector effects? What is ETIP's role in the
federal environment? What modifications would improve ETIP's ability
to fulfill its mission? All of these issues concern the effect of an
organization, and its formal and informal operating processes, on other
organizations within the same over-arching environment. Although
anthropologists have avoided studies of bureaucracies and administrative
elites (despite suggestions that the vacuum be filled (e.g. Foster 1969)),
these issues are important to the discipline.

Their resolution, however, is difficult, and must certainly transcend
any straight-forward summation of results. Some of the problems involved
in this kind of evaluation should be mentioned. It is difficult to conduct
a real-time appraisal of ETIP, since it is a dynamically evolving program
whose goals, clients, projects and personnel change during the course
of study. Because the program is still developing, there is a lack of
data on its civilian sector effects, and summative evaluations of
particular ETIP projects will not be available until late in 1977.
Furthermore, it is very hard to measure many of the ephemeral effects,
such as changes in agency policy, which are the direct concern of the
study. Finally, there is no baseline data (similar programs) which could
be used for comparison. In general, the possibilities for truly
rigorous inference are limited. Still, since the research provided an
opportunity to study American social and administrative organization at a
level that is rarely possible, I decided to go ahead.

At the time of my introduction to ETIP, the program had a staff of
12 and was conducting more than 100 projects with over 40 federal agency
and private clients. Materials on a single project sometimes filled
an entire file drawer. Often 4 or 5 co-operating offices were involved,
and more than a score of individuals. Faced with access to so much data,
I first had to sit back and develop a strategy with which to proceed. My
background in anthropological ethnography provided, I think, the best
starting point.

The first task was to reach a better understanding about the
program itself, by conducting an ethnography of ETIP which could answer
a whole series of "who", "what", "where", and "how" questions. Since
internal documentary evidence seemed suspiciously one-sided, I searched
for as many different sources of data as possible. Only after determining
precisely what ETIP did could I turn to the program's relationships with
client agencies.

The statements of goals and procedures found in formal documents were
supplemented by an analysis of entire stacks of bureaucratic paperwork
(memos, schedules, budgets) which put the formal evidence in another light.
A whole range of informal documents about particular ETIP projects provided
a basis for "quantified" measures of project type and project success. Most
important, however, were the interviews, which were conducted with both
current and former staff members through a variety of techniques. Only
when information from all these sources was combined and analyzed could
a clear idea of ETIP's operations be developed. Although many details
must await further confirmation, ETIP is clearly a rather different
organization in practice than it appears in formal design, with goals that
are often far removed from any issues of "technological change."

This last fact raises an important problem. Although ETIP sometimes
seems to go beyond its formal mandate, many such activities are both
useful and successful. What, however, should be the goals against
which the program's success is measured?

The issue is difficult, for any action program involves a diversity 
of interests whose goals and implementation policies differ (Krause & 
Howard 1975). An ethnography of the implementation process, at least, 
can begin to show how formal goals in an action program are modified 
in practice. The definition of such informal goals cannot be obtained
through summative evaluation alone. On the other hand, the qualitative
appraisal of ETIP unearthed a number of dimensions (e.g. project type,
project objectives, staff commitment, etc.) for which appropriate
quantitative measures were developed and analyzed.

Questions about goal orientation are even more important in the 
current stage of research, the study of ETIP/agency relations. The 
implementation of action programs is a political process (Krause &
Howard 1975), and differences between ETIP and its agency clients are
to be anticipated. To what extent, though, do differences in goal
orientation, funding priorities, or implementation procedures affect
the success of the ETIP program?

To find out, a series of case studies is being investigated covering
the range of project types discovered in the earlier ethnography. Again,
a multi-method approach is being used to provide as much diversity in
the sources of data as possible. These in-depth case studies are expected 
to define the parameters for a more rapid survey of other ETIP/agency 
relationships and of civilian sector responses to particular ETIP 
projects. 

In the last stage of research all sources of data will be combined, and
lacunae located and filled. An analysis of both qualitative and 
quantitative relationships will proceed. While the final result will 
not be an entirely rigorous assessment of ETIP's performance, an analytical 
background for later interpretation will have been found. 

CONCLUSIONS 

Process evaluations of action programs are closely akin to anthropological 
ethnographies. The researcher must integrate a diverse body of data,
encompassing outsider and insider points of view, in order to explicate
social process. The outcome is an in-depth understanding which enables the
definition and interpretation of appropriate quantitative measures. While
the objective validity of such an ethnography is difficult to measure,
some level of validity does exist, for an ethnographer cannot simply
manipulate variables ex post facto to substantiate his conclusions (Campbell
1974). The ethnographer is concerned with patterns, and any particular
hypothesis has multiple implications which must be demonstrated. Moreover,
a good ethnographer must successfully account for similarities and
differences between the observer's and the native's points of view.

Although ethnographic techniques are still imperfect, without them only
common sense would be available to help interpret quantitative measures.
Certainly, a thorough-going ethnography provides a better guide. Experience
has taught us that social systems often behave in a counter-intuitive
fashion (Forrester 1968), and common sense alone provides a limited basis
for the design of action programs. Qualitative assessments of program
processes are essential in interpreting summative results and designing
program improvements.

Still, even if ethnographic techniques are important to action
research, why should anthropologists be involved? Mundane facts could
be cited, such as the need for solutions to practical problems, or the
job shortage in academia, but evaluation research has a theoretical
relevance to anthropology as well.

The focus of this symposium is the place of theory in problem oriented 
anthropology. Thus far, anthropological research has not yielded a 
cumulative growth of theory. But our discipline does not face this 
problem alone. Recent critics have noted the general deficiencies of 
social science understandings of human behavior (Gordon & Morse 1975). The
major problem facing evaluation research is not inadequate methodology
(new tools can be developed) but a lack of appropriate middle level
theory. Yet evaluation research, the assessment of action programs, has
an enormous potential for developing such theory. Its findings are
applied as social policies, and the validity of conclusions is subject to
rapid real world testing.

Problem oriented research, such as program evaluation, is crucial to 
anthropology. Theories can only be proved if they are applied to concrete 
situations where they can also be falsified. Applied anthropology is not 
a poor relation to the mainstream, but must lead in the development of
new understandings of man's place in the world. Evaluation research provides not
merely a new employment option, but an opportunity to re-integrate
anthropological theory and practice as well. 
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proving cognitive abilities. 

The method of program admission — .on the basis of lowest pre-test scores- 
can greatly affect progrjam results. Such admission procedures assume that 
the resutls of the singlje pre-test provide a valid measure of the cognitive 
ability of students, but! in reality a student's score on a given test can 
vary widely due to a raJge of chance factors. This variation is the normal 
distribution of ore-test scores around a mean. By taking only those students 
with the lowest scores into a program, we are emphasizing the downward chance 
variation in test scores. If variation was due to chance, on immediate re-test, 
the group would not repeat original scores, but duplicate the orignaj, popu- 
lation distribution (see diagram 1). In other words, regression to the popu- 
lation mean would lead to a substantial improvement in post-test scores even 
if comr^nsatory education had no impact at all. 

Comparative evaluations have been designed to deal with this problem, 
but there are'Vimportant , obstacles to successful inference. The well-known 
West inghouse/t)hio University evaluation of HEAD START provides a good example, 
though one in which statistical artifacts lower rather than raise our estimation 
of program results . ""The researchers were asked to design an ex-post facto 
study study several ' years after the program had begun. They proceeded by 
matching HEAD START participants with outsiders on the basis of cognitive pre- 
tests on a series of performance scales, socio-economic backgrounds and sub- 
sequent educational experience. They then compared scores on a cognitive 
ability post-te5J;^'rD"''&^e^^if^E START participants showed relative improvement. 
The evaluation seemed to clearly indicate that HEAD START students di4 no- 
better than their "untreated" partner, and the effects of a program costing 
hundreds of millions of dollars were questioned (Cicirelli, et al. 1969). 



: These results, however, are suspect, because of the difficulties in 
matching pairs (Campbell & Erlebacher 1970). HEAD START programs are 
designed for the most disadvantaged, and usarly all eligible participants 
are enrolled. Thus, head start participants can be expected to score lower 
on a; cognitive abilities test than the population at large. The matched pairs, 
howeyer, would tend to be those members of the general population who had 
happened to score poorly on the particular pre-test used. The mean score 
of the population from which these individuals came would be significantly 
higher than the mean for disadvantaged students. Furthermore, in a retest, 
each group would regress towards its own population mean. Since the HEAD START 
mean is lower, head start students would tend to show less relative Improvement, 
and the prograir. would seem a failure (see diagram 2). Still unanswered, 

however, is whether even this improvement would have been shown in the 

/ / 

absence of the HEAD START program. 

Quantitative methodologists are well aware of the obstacles to proper 
inference — the possibility of concluding that a treatment has an effect,. 

' when it really doesn't, and vice versa. Although the warnings are sometimes 
ignored^ a variety of threats to the internal validity of an evaluation have 
been- noted, such as differential maturation of comparison groups, variation 
in measuring instruments, differential mortality in treated and untreated 
groups, and so on (Cook & Campbell 1975). Depending on the fineness of 
distinctions, the list could be expanded to at least 15 or 20 items. 

Sophisticated evaluation designs have increased our ability to distinguish 
real from apparent program" effects (Campbell & Stanley 1966). Although true 

- experiments in which subjects are randomly assigned to test and control 
groups are the best anser, quasi-experimental designs can provide 
meaningful results provided researchers are aware of the limitations 



(Campbell 1974). Despite continuing examples of inadequate methodology, 
the theoretical basis for methodological sound summative evaluations exists. 
Yet, even so, there are still limits to tha utility of ^quantitative assessments 
alone « 

A quantitative evaluation of results is unable to answer many quef.tions 
no matter how good its experimental or quasi-experimental design. It cannot 
tell us how a program is implemented , whether the results are transferable 
to other situations, or why the observed results occurred. Not all of the 
threats to evaluation validity can be met by considering quantitative results. 
Such analysis cannot tell us whether formal program goals actually reflect 
informal er.ds sought. It cannot tell us if there are differences in the way 
a program is implemented for different individuals or for different locations.. 
It cannot show whether the program has changed over time. It can't elucidate 
subtle effects of implementation in participant selection, differential 
mortality or differential learning. It cannot validate the adequacy of 
testing methods or ascertain diffuse program affects. We must have a broader 
understanding of what a ptogram does, before we can begin to explain why it 
succeeds or fails, 

Summative evaluations treat acMon programs as if they were black boxes. 
They demonstrate what results have occurred^, but do not elucidate equally 
important how and wh^ questions. To understand these, we must open the black 
box to look at the process of. program implelaentation and supplement the analysis 



of results. Such research can define the qua 
quantitative data can be gathered and provic 
inference. Moreover, the distinction betweejn 
understanding i^s not so great as we make it 
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litative dimensions about which 
s a further grounding for theoretical 

quantitative aind qualitative 
seem (see Campbell 1975). 



THE EVALUATION OF PROCESS 

While the methodological problems of quantitative evaluation are fairly 
great, obstacles to rigorous qualitative evaluation seem almost of another 
order of magnitude. Canons of rigor have yet to be established and qualitative 
assessments of process remain as much an art as a science, albeit an art 
at which anthropologists are thought to excel. All we can do at present is 
indicate existing problems and suggest appropriate directions for pragmatic 
research. To understand process, we must understand how particular things 
are processed. In the context of action programs, we must learn how and why 
things occur as an individual progresses through "treatment". The ethnographic 
model of anthropology, with its qualitative assessment of a particular case, 
provides a basis from which to start. 

Ethnography achieves understandings by combining the knowledge of insiders 
and outsiders. The ethnographer has an external perspective which lets him 
see the importance in what an insider, from too great familiarity, dismisses 
as trivial. Still, to interpret his observations, the ethnographer needs a 
broader context of comparison. In traditional fieldwork, this is provided 
by anthropological theory and a familiarity with similar regional cultures 
and similar field experiences. Even so, a consideration of the differences 
and similarities among a range of local sites is often useful. In program 
evaluation, the ethnographer should supplement his disciplinary background 
with general knowledge about formal organizations and about similar programs 
or settings. Lacking a contemporary program for comparison, the ethnographer 
can at least consider sufficient time depth to permit an adequate comparative 
appraisal. 

The mechanics of ethnographic technique are observation and participation, 
of which there are two aspects: On the one hand, the ethnographer as an outside 
specialist tries to place observed events within his own categories of 
relationship, but at the same time he tries to understand their import 
for the participants themselves. The most interesting part of ethnography 
involves putting these two views, the external subjectivity of the 
observer and the internal subjectivity of the native, together. Both 
perspectives must be triangulated with developed theory to provide a 
fairly valid basis for ethnographic inference. 

Still, there is no guarantee that the interpretation of any 
single ethnographer would be replicated through restudy by another. 
Unlike summative evaluation, which measures a few well-operationalized 
variables, the study of process is concerned with patterns among a 
much larger range of factors. The methodological problems of process 
analysis cannot, at present, be rigorously solved (see Campbell 1974). 

Yet quantification, as such, means very little. It is often the 
qualitative dimensions distinguished by ethnography that provide the 
appropriate basis for quantitative scales. Quantitative assessments, 
moreover, can tell us little about how and why particular relationships 
exist. The import of any quantitative analysis of results rests on an 
independent appraisal of the causal relations involved. Ethnography 
tries to comprehend these relationships in their entirety, and though 
the problem of rigor is severe, the problem of artificiality is nearly 
eliminated. The final purpose of evaluation is pragmatic: the 
improvement of program results. An ethnography of process provides 
reasonable insights about how such improvements could be made in a 
way that summative evaluation alone cannot. 

PROCESS EVALUATION IN PRACTICE 

The best way to illustrate the nature of process evaluation and 
its problems is through an example such as my ongoing assessment of 
the Experimental Technology Incentives Program (ETIP) of the National 
Bureau of Standards (NBS). ETIP is not a typical candidate for social 
science evaluation. Its formal mandate, which derives from the President's 
1972 Science and Technology message, is to "facilitate technological 
change." ETIP seeks to achieve this goal indirectly, by developing 
experimental policy changes in co-operating governmental organizations 
which indirectly "create an environment conducive to innovation." Despite 
the lack of traditional "clients" or "treatments," an assessment of ETIP 
poses the same questions as any study of process. 

Last spring, I was asked by the National Academy of Sciences (NAS) to 
conduct an 18 month evaluation of the ETIP program. The questions at 
issue were not ETIP's final effects on technological change (these could 
be considered later by technologists) but rather an assessment of ETIP 
as a whole, as an experiment in organizational form within the federal 
bureaucracy. 

Several basic questions were raised by NAS: Can an organization like 
ETIP actually convince federal agencies to experiment with operating 
policies? To what extent has ETIP been responsible for any policy changes 
that have occurred? Do ETIP's policy experiments reflect mandated program 
goals, or the internal needs of co-operating agencies? What factors affect 
ETIP's success in developing and implementing experiments? Do ETIP's 
experiments have any civilian sector effects? What is ETIP's role in the 
federal environment? What modifications would improve ETIP's ability 
to fulfill its mission? All of these issues concern the effect of an 
organization, and its formal and informal operating processes, on other 
organizations within the same over-arching environment. Although 
anthropologists have avoided studies of bureaucracies and administrative 
elites (despite suggestions that the vacuum be filled (e.g. Foster 1969)), 
these issues are important to the discipline. 

Their resolution, however, is difficult, and must certainly transcend 
any straight-forward summation of results. Some of the problems involved 
in this kind of evaluation should be mentioned. It is difficult to conduct 
a real-time appraisal of ETIP, since it is a dynamically evolving program 
whose goals, clients, projects and personnel change during the course 
of study. Because the program is still developing there is a lack of 
data on its civilian sector effects, and summative evaluations of 
particular ETIP projects will not be available until late in 1977. 
Furthermore, it is very hard to measure many of the ephemeral effects, 
such as changes in agency policy, which are the direct concern of the 
study. Finally, there is no baseline data, no similar programs, which could 
be used for comparison. In general, the possibilities for truly 
rigorous inference are limited. Still, since the research provided an 
opportunity to study American social and administrative organization at a 
level that is rarely possible, I decided to go ahead. 

At the time of my introduction to ETIP, the program had a staff of 
12 and was conducting more than 100 projects with over 40 federal agency 
and private clients. Materials on a single project sometimes filled 
an entire file drawer. Often 4 or 5 co-operating offices were involved, 